Abstract: Many tasks in robotics can be decomposed into
sub-tasks that are performed simultaneously. In many cases,
these sub-tasks cannot all be achieved jointly, and the
sub-tasks have to be prioritized to resolve the conflict. In
this paper, we discuss a novel approach for learning
a prioritized control law built on a set of sub-tasks represented
by motor primitives. The primitives are executed simultaneously
but have different priorities. Primitives of higher priority can
override the commands of conflicting lower priority ones.
The dominance structure of these primitives has a significant
impact on the performance of the prioritized control law. We
evaluate the proposed approach with a ball-bouncing task on
a Barrett WAM.

I. INTRODUCTION 

When learning a new skill, it is often easier to practice the 
required sub-tasks separately and later on combine them to 
perform the task - instead of attempting to learn the complete 
skill as a whole. For example, in sports sub-tasks can often 
be trained separately. Individual skills required in the sport 
are trained in isolation to improve the overall performance, 
e.g., in volleyball a serve can be trained without playing the 
whole game. 

Sub-tasks often have to be performed simultaneously and
it is not always possible to completely fulfill all at once.
Hence, the sub-tasks need to be prioritized. An intuitive
example of this kind of prioritization occurs during
a volleyball game: a player considers hitting the ball (and
hence preventing it from touching the ground and his team
losing a point) more important than locating a teammate and
playing the ball precisely to him. The player will attempt to
fulfill both sub-tasks. If this is not possible, it is often
better to "save" the ball with a high hit and hope that another
player recovers it rather than immediately losing a point.

In this paper, we learn different sub-tasks that are
represented by motor primitives, which combined can perform
a more complicated task. To do so, we stack
controls corresponding to different primitives that represent
movements in task space. These primitives are assigned
different priorities, and the motor commands corresponding
to primitives with higher priorities can override the motor
commands of lower priority ones. The proposed approach is
outlined in Sect. I-A and further developed in Sect. III. We
evaluate our approach with a ball-bouncing task (see Fig. 1
and Sect. IV).

Fig. 1: This figure illustrates the ball-bouncing task on a
Barrett WAM. The goal is to keep the ball bouncing on the
racket.

As the sub-tasks describe the movements in task space, 
we have to learn a control that is mapping to the robot 
joint space. Unfortunately, this mapping is not a well-defined 
function for many robots. For example, if the considered task 
space has fewer degrees of freedom than the robot, multiple 
solutions are possible. This redundancy can be resolved by 
introducing a null-space control, i.e., a behavior that operates 
on the redundant degrees of freedom. Such a null-space
control can, for example, pull the robot towards a rest posture
[1], prevent getting close to joint limits [2], avoid obstacles
[3] or singularities [4]. Computing the task space control
often corresponds to an optimization problem, that can for 
example be solved by a gradient based approach. A well 
known approach is the pseudo-inverse solution d 0. An 
alternative is to learn an operational space control law that 
implicitly includes the null-space behavior [5]. Once learned,
it corresponds to a unique mapping from desired actions in 
operational space to required actions in joint space. 

The problem studied in this paper is related to hierarchical
control problems as discussed in [6]. Using prioritized
primitives in classical control has been explored in [7] by using
analytical projections into the null space. In this paper, we
propose a learning approach that does not require complete
knowledge of the system, the constraints, and the task. In
the reinforcement learning community, the composition of
options (i.e., concurrent options), which is related to the
concurrent execution of primitives, has been studied [8].
Learning null-space control has been explored in [9]. In
contrast, we do not attempt to recover the implicit null-space
policy but build a hierarchical operational space control law
from user demonstrated primitives.

A. Proposed Approach 

Based on the observation that many tasks can be described
as a superposition of sub-tasks, we want to have a set of
controls that can be executed simultaneously. As a
representation for the sub-tasks, we chose the dynamical systems
motor primitives, which are discussed in more detail in
Section II. Such primitives are well suited as a representation
for the sub-tasks as they ensure the stability of the movement
generation. They are invariant under transformations of the
initial position and velocity, the final position and velocity,
the duration, as well as the movement amplitude.

In this paper, these primitives are described in different
task spaces, e.g., in the form

$$\ddot{x}_i = \pi_i(x_i, \dot{x}_i, z),$$

where $z$ denotes a shared canonical system while $x_i$ are
positions in task space $i$. For example, if we have a primitive
"move end-effector up and down", its task space would
correspond to the Cartesian position indicating the height
(as well as the corresponding velocities and accelerations)
but not include the sideways movement or the orientation of
the end-effector. The dynamical systems motor primitives are
well suited to represent different kinds of vertical movements
starting and ending at various states and of different duration.
These primitives are prioritized such that

$$i \succ i - 1,$$

which reads as "task $i$ dominates task $i - 1$". If both sub-tasks
can be fulfilled at the same time, our system will do so,
but if this is not possible, sub-task $i$ will be fulfilled
at the expense of sub-task $i - 1$. We attempt to reproduce a
complex task that consists of several sub-tasks, represented
by motor primitives,

$$\{\pi_1, \pi_2, \ldots, \pi_N\},$$

that are executed concurrently following the
prioritization scheme

$$N \succ N - 1 \succ \cdots \succ 2 \succ 1.$$

This approach requires a prioritized control law that
composes the motor command out of the primitives $\pi_i$, i.e.,

$$u = f(\pi_1, \pi_2, \ldots, \pi_N, q, \dot{q}),$$

where $q$, $\dot{q}$ are the joint position and joint velocity, and
$u$ are the generated motor commands (torques or accelerations).

We try to acquire the prioritized control law in three steps,
which we will illustrate with the ball-bouncing task:

1) We observe $x_i(t)$, $\dot{x}_i(t)$, $\ddot{x}_i(t)$ individually
   for each of the primitives that will be used for the task. For
   the ball-bouncing example, we may have the following
   sub-tasks: "move under the ball", "hit the ball",
   and "change racket orientation". The training data is
   collected by executing only one primitive at a time
   without considering the global strategy, e.g., for the
   "change racket orientation" primitive by keeping the
   position of the racket fixed and only changing its
   orientation without a ball being present. This training
   data is used to acquire the task by imitation learning
   under the assumption that these tasks did not need to
   overrule each other in the demonstration (Sect. III).

2) We enumerate all possible dominance structures and
   learn a prioritized control law for each dominance list
   that fuses the motor primitives. For the three
   ball-bouncing primitives there are six possible orders, as
   listed in Table I.

3) We choose the most successful of these approaches.
   The activation and adaptation of the different
   primitives is handled by a strategy layer (Sect. IV-B). In the
   ball-bouncing task, we evaluate how long each of the
   prioritized control laws keeps the ball in the air and
   pick the best performing one (Sect. IV-C).

Clearly, enumerating all possible dominance structures only
works for small systems (as the number of possibilities grows
as $n!$, i.e., faster than exponentially).
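The enumeration in step 2 can be illustrated with a minimal sketch. The function name `dominance_structures` and the printed format are our own illustrative choices; the primitive names are those of the ball-bouncing example.

```python
from itertools import permutations
from math import factorial

# Primitive names from the ball-bouncing example.
primitives = ["move under the ball", "hit the ball", "change racket orientation"]


def dominance_structures(names):
    """Return all orderings of the primitives, highest priority first."""
    return list(permutations(names))


orders = dominance_structures(primitives)
for order in orders:
    print(" > ".join(order))

# Number of candidate structures for n = 1, ..., 7 primitives (n!).
counts = {n: factorial(n) for n in range(1, 8)}
print(counts)
```

For three primitives the loop prints six orderings. The `counts` dictionary shows how quickly the number of candidates grows, which is why exhaustive search is only practical for a handful of primitives.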

II. BACKGROUND: MOTOR PRIMITIVES 

While the original formulation in [10] for discrete
dynamical systems motor primitives used a second-order system to
represent the phase $z$ of the movement, this formulation has
proven to be unnecessarily complicated in practice. Since
then, it has been simplified and, in [11], it was shown that
a single first order system suffices

$$\dot{z} = -\tau\alpha_z z. \tag{1}$$

This canonical system has the time constant $\tau = 1/T$, where
$T$ is the duration of the motor primitive, and a parameter $\alpha_z$,
which is chosen such that $z \approx 0$ at $T$ to ensure that the
influence of the transformation function, shown in Eq. (3),
vanishes. Subsequently, the internal state $y$ of a second
system is chosen such that positions $x$ of all degrees of
freedom are given by $x = y_1$, the velocities $\dot{x}$ by
$\dot{x} = \tau y_2 = \dot{y}_1$, and the accelerations $\ddot{x}$ by
$\ddot{x} = \tau\dot{y}_2$. Under
these assumptions, the learned dynamics of Ijspeert motor
primitives can be expressed in the following form

$$\dot{y}_2 = \tau\alpha_y\left(\beta_y(g - y_1) - y_2\right) + \tau A f(z),$$
$$\dot{y}_1 = \tau y_2. \tag{2}$$



This set of differential equations has the same time
constant $\tau$ as the canonical system, parameters $\alpha_y$, $\beta_y$ set
such that the system is critically damped, a goal parameter
$g$, a transformation function $f$ and an amplitude matrix
$A = \mathrm{diag}(a_1, a_2, \ldots, a_n)$, with the amplitude modifier
$a = [a_1, a_2, \ldots, a_n]$. In [11], they use $a = g - y_1^0$ with the
initial position $y_1^0$, which ensures linear scaling. Alternative
choices are possibly better suited for specific tasks, see e.g., [12].
The transformation function $f(z)$ alters the output of the first
system, in Eq. (1), so that the second system, in Eq. (2), can
represent complex nonlinear patterns and it is given by

$$f(z) = \sum_{i=1}^{N}\psi_i(z)\, w_i\, z. \tag{3}$$



Here, $w_i$ contains the $i$th adjustable parameter of all degrees
of freedom, $N$ is the number of parameters per degree
of freedom, and $\psi_i(z)$ are the corresponding weighting
functions [11]. Normalized Gaussian kernels are used as
weighting functions given by

$$\psi_i(z) = \frac{\exp\left(-h_i(z - c_i)^2\right)}{\sum_{j=1}^{N}\exp\left(-h_j(z - c_j)^2\right)}.$$

These weighting functions localize the interaction in phase
space using the centers $c_i$ and widths $h_i$. Note that the
degrees of freedom (DoF) are usually all modeled as
independent in Eq. (2). All DoFs are synchronous as the
dynamical systems for all DoFs start at the same time,
have the same duration, and the shape of the movement
is generated using the transformation $f(z)$ in Eq. (3). This
transformation function is learned as a function of the shared
canonical system in Eq. (1).
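To make the interplay of Eqs. (1)-(3) concrete, the following is a minimal sketch that integrates a single degree of freedom with explicit Euler steps. The function name `rollout`, the numerical values of $\alpha_z$ and $\alpha_y$, the step size, and the placement of the kernel centers and widths are illustrative assumptions and are not taken from the paper. The choice $\beta_y = \alpha_y/4$ makes the second-order system critically damped.

```python
import numpy as np


def rollout(w, g, y0, T=1.0, dt=0.001, alpha_z=6.9, alpha_y=25.0):
    """Euler integration of Eqs. (1)-(3) for a single degree of freedom."""
    w = np.asarray(w, dtype=float)
    tau = 1.0 / T
    beta_y = alpha_y / 4.0                      # critical damping
    # Assumed kernel placement: centers equally spaced in time,
    # widths derived from the spacing of neighboring centers.
    c = np.exp(-alpha_z * np.linspace(0.0, 1.0, len(w)))
    h = 1.0 / np.gradient(c) ** 2
    a = g - y0                                  # amplitude modifier a = g - y_1^0
    z, y1, y2 = 1.0, y0, 0.0
    ys, zs = [], []
    for _ in range(int(round(T / dt))):
        psi = np.exp(-h * (z - c) ** 2)
        psi /= psi.sum() + 1e-12                # normalized Gaussian kernels
        f = np.dot(psi, w) * z                  # Eq. (3)
        y2_dot = tau * alpha_y * (beta_y * (g - y1) - y2) + tau * a * f  # Eq. (2)
        y1_dot = tau * y2
        z_dot = -tau * alpha_z * z              # Eq. (1)
        y2 += dt * y2_dot
        y1 += dt * y1_dot
        z += dt * z_dot
        ys.append(y1)
        zs.append(z)
    return np.array(ys), np.array(zs)


# With all weights set to zero the primitive is a plain point attractor.
y, z = rollout(w=np.zeros(10), g=1.0, y0=0.0)
print(y[-1], z[-1])
```

With zero weights the position converges to the goal $g$ while the phase $z$ decays towards zero, so the influence of $f(z)$ vanishes at the end of the movement. Non-zero weights change the shape of the trajectory in between.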

The original formulation assumes that the goal velocity is
zero. Clearly, this behavior is undesirable for hitting the ball
in the ball-bouncing task. In [13], we proposed a modification
that allows specifying arbitrary goal velocities:

$$\dot{y}_2 = (1 - z)\,\tau\alpha_g\left(\beta_g(g_m - y_1) + \frac{\dot{g} - \dot{y}_1}{\tau}\right) + \tau A f,$$
$$\dot{y}_1 = \tau y_2,$$
$$g_m = g_m^0 - \dot{g}\,\frac{\ln(z)}{\tau\alpha_h},$$

where $\dot{g}$ is the desired final velocity, $g_m$ is the moving goal,
and the initial position of the moving goal $g_m^0 = g - \tau\dot{g}$
ensures that $g_m(T) = g$. The term $-\ln(z)/(\tau\alpha_h)$ is
proportional to the time if the canonical system in Eq. (1)
runs unaltered; however, adaptation of $z$ allows the
straightforward adaptation of the hitting time.

As suggested in [10], locally weighted linear regression
can be used for imitation learning. The duration of discrete
movements is extracted using motion detection and the
time constants are set accordingly. Additional feedback terms can
be added as shown in [10], [12].
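The imitation learning step can be sketched as follows. The paper does not spell out the regression, so the sketch assumes the common variant in which the target values of $f$ are obtained by rearranging Eq. (2) for a demonstrated trajectory, and each weight $w_i$ is fitted by its own weighted least-squares problem with regressor $z$ and sample weights $\psi_i(z)$. The names `forcing_target`, `kernels`, and `fit_weights`, as well as all numerical values, are illustrative.

```python
import numpy as np


def forcing_target(x, x_dot, x_ddot, g, a, tau, alpha_y, beta_y):
    """Rearrange Eq. (2) for f, using y1 = x, y2 = x_dot / tau."""
    return (x_ddot / tau**2 - alpha_y * (beta_y * (g - x) - x_dot / tau)) / a


def kernels(z, c, h):
    """Normalized Gaussian kernels, array of shape (len(z), len(c))."""
    psi = np.exp(-h[None, :] * (z[:, None] - c[None, :]) ** 2)
    return psi / (psi.sum(axis=1, keepdims=True) + 1e-12)


def fit_weights(z, f_target, c, h):
    """Per-kernel weighted least squares: minimize sum_t psi_i (f_t - w_i z_t)^2."""
    psi = kernels(z, c, h)
    num = psi.T @ (z * f_target)
    den = psi.T @ (z * z) + 1e-12
    return num / den


alpha_z, n_kernels = 6.9, 10
t = np.linspace(0.0, 1.0, 500)
z = np.exp(-alpha_z * t)                        # unaltered canonical system, T = 1
c = np.exp(-alpha_z * np.linspace(0.0, 1.0, n_kernels))
h = 1.0 / np.gradient(c) ** 2
f_target = 2.5 * z                              # synthetic target f(z) = 2.5 z
w = fit_weights(z, f_target, c, h)
print(w)

# A demonstration that already rests at the goal requires no forcing term.
f_rest = forcing_target(x=1.0, x_dot=0.0, x_ddot=0.0, g=1.0, a=1.0,
                        tau=1.0, alpha_y=25.0, beta_y=6.25)
```

Because the kernels are normalized, the synthetic target $f(z) = 2.5\,z$ is reproduced exactly when all weights equal 2.5, which is what the fit recovers.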

III. LEARNING THE PRIORITIZED CONTROL LAW 

By learning the prioritized control, we want to obtain a
control law

$$u = \ddot{q} = f(\pi_1, \pi_2, \ldots, \pi_N, q, \dot{q}),$$

i.e., we want to obtain the required control $u$ that executes the
primitives $\pi_1, \pi_2, \ldots, \pi_N$. Here, the controls correspond to
the joint accelerations $\ddot{q}$. The required joint accelerations not
only depend on the primitives but also on the current state
of the robot, i.e., the joint positions $q$ and joint velocities
$\dot{q}$. Any control law can be represented locally as a linear
control law. In our setting, these linear control laws can be
represented as

$$u = \ddot{q} = \begin{bmatrix}\ddot{x}_i \\ \dot{q} \\ q\end{bmatrix}^T\theta = \phi^T\theta,$$

where $\theta$ are the parameters we want to learn and
$\phi = [\ddot{x}_i^T \;\; \dot{q}^T \;\; q^T]^T$ acts as features. Often the
actions of the primitive $\ddot{x}_i$ can be achieved in multiple
different ways due
to the redundancies in the robot degrees of freedom. To
ensure consistency, a null-space control is introduced. The
null-space control can, for example, be defined to pull the
robot towards a rest posture $q_0$, resulting in the null-space
control

$$u_0 = -K_D\dot{q} - K_P(q - q_0),$$

where $K_D$ and $K_P$ are gains for the velocities and positions
respectively.
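As a small illustration of the rest-posture null-space control, consider the following sketch. The gains, the number of joints, and the rest posture are made-up values for illustration and are not taken from the paper.

```python
import numpy as np


def null_space_control(q, q_dot, q_rest, K_P, K_D):
    """PD law that pulls the joints towards the rest posture q_rest."""
    return -K_D @ q_dot - K_P @ (q - q_rest)


n_joints = 7                                    # assumed number of joints
q_rest = np.zeros(n_joints)
K_P = 10.0 * np.eye(n_joints)                   # illustrative gains
K_D = 2.0 * np.eye(n_joints)

q = q_rest + 0.1                                # posture displaced by 0.1 rad
u0 = null_space_control(q, np.zeros(n_joints), q_rest, K_P, K_D)
u_rest = null_space_control(q_rest, np.zeros(n_joints), q_rest, K_P, K_D)
print(u0)
```

For the displaced posture every entry of `u0` is negative, i.e., the control accelerates the joints back towards the rest posture, and it vanishes when the robot rests at that posture.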

To learn the prioritized control law, we try to generalize 
the learning of the operational space control approach from 
[ 5 ] to a hierarchical control approach (TJ0- 

A. Single Primitive Control Law 

A straightforward approach to learn the motor commands
$u$, represented by the linear model $u = \phi^T\theta$, is using linear
regression. This approach minimizes the squared error

$$E^2 = \sum_{t=1}^{T}\left(u_t^{\mathrm{ref}} - \phi_t^T\theta\right)^2$$

between the demonstrated control of the primitive $u_t^{\mathrm{ref}}$ and
the recovered linear policy $u_t = \phi_t^T\theta$, where $T$ is the
number of samples. The parameters minimizing this error
are

$$\theta = \left(\Phi^T\Phi + \lambda I\right)^{-1}\Phi^T U, \tag{4}$$

with $\Phi$ and $U$ containing the values of the demonstrated $\phi$
and $u$ for all time-steps $t$ respectively, and a ridge factor $\lambda$.
If the task space and the joint space coincide, the controls
$u = \ddot{q}$ are identical to the action of the primitive $\ddot{x}_i$. We also
know that locally any control law that can be learned from
data is a viable control law [5]. The error with respect to
the training data is minimized. However, if the training data
is not consistent, the plain linear regression will average the
motor commands, which is unlikely to fulfill the actions of
the primitive.
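The ridge regression of Eq. (4) and the averaging effect on inconsistent data can be sketched as follows. The data is synthetic and the function name `fit_linear_policy` is an illustrative choice.

```python
import numpy as np


def fit_linear_policy(Phi, U, ridge=1e-6):
    """Eq. (4): theta = (Phi^T Phi + lambda I)^{-1} Phi^T U."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(d), Phi.T @ U)


# Consistent synthetic data: the controls are an exact linear function
# of the features, so the parameters are recovered.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 3))
theta_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
U = Phi @ theta_true
theta = fit_linear_policy(Phi, U)

# Inconsistent data: the same feature value is paired with two
# different commands, and the regression returns their average.
Phi_amb = np.ones((2, 1))
U_amb = np.array([[1.0], [-1.0]])
theta_avg = fit_linear_policy(Phi_amb, U_amb)
print(theta, theta_avg)
```

In the second case the two demonstrated commands $+1$ and $-1$ are averaged to a command close to zero, which matches neither demonstration. This is the failure mode that the consistency-enforcing weighting is meant to avoid.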

In order to enforce consistency, the learning approach has
to resolve the redundancy and incorporate the null-space
control. We can achieve this by using the program

$$\min_{u} J = (u - u_0)^T N (u - u_0)$$
$$\text{s.t.}\quad \ddot{x} = \pi(x, \dot{x}, z) \tag{5}$$



as discussed in [1]. Here the cost $J$ is defined as the weighted
squared difference of the control $u$ and the null-space control
$u_0$, where the metric $N$ is a positive semi-definite matrix.
The idea is to find controls $u$ that are as close as possible to
the null-space control $u_0$ while still fulfilling the constraints
of the primitive $\pi$. This program can also be solved as
discussed in [5]. Briefly speaking, the regression in Eq. (4)
can be made consistent by weighting down the error by
weights $w_t$ and hence obtaining

$$\theta = \left(\Phi^T W \Phi + \lambda I\right)^{-1}\Phi^T W U \tag{6}$$

with $W = \mathrm{diag}(w_1, \ldots, w_T)$ for $T$ samples. This approach
works well for linear models and can be made to work
with multiple locally linear control laws. Nevertheless, it
maximizes a reward instead of minimizing a cost. The cost
$J$ can be transformed into weights $w_t$ by passing it through
an exponential function

$$w_t = \exp\left(-\alpha\,\tilde{u}_t^T N \tilde{u}_t\right),$$

where $\tilde{u}_t = (u_t - u_0)$. The scaling factor $\alpha$ acts as a
monotonic transformation that does not affect the optimal
solution but can increase the efficiency of the learning
algorithm.

Algorithm 1 Learning the Prioritized Control Law

define null-space control $u_0$, metric $N$, scaling factor $\alpha$
collect controls $u_{i,t}$ and features $\phi_{i,t}$ for all primitives
    $i \in \{1, \ldots, N\}$ and all time-steps $t \in \{1, \ldots, T\}$ separately
for primitives $i = 1 \ldots N$ ($N$: highest priority) do
    for time-steps $t = 1 \ldots T$ do
        calculate offset controls
            $\tilde{u}_{i,t} = u_{i,t} - \sum_{j=1}^{i-1}\Delta u_{j,t} - u_{0,t}$
        calculate weights $w_{i,t} = \exp(-\alpha\,\tilde{u}_{i,t}^T N \tilde{u}_{i,t})$
    end for
    build control matrix $\tilde{U}_i$ containing $\tilde{u}_{i,1} \ldots \tilde{u}_{i,T}$
    build feature matrix $\Phi_i$ containing $\phi_{i,1} \ldots \phi_{i,T}$
    build weight matrix $W_i = \mathrm{diag}(w_{i,1}, \ldots, w_{i,T})$
    calculate parameters
        $\theta_i = (\Phi_i^T W_i \Phi_i + \lambda I)^{-1}\Phi_i^T W_i \tilde{U}_i$
end for

Using the Woodbury formula [14], Eq. (6) can be
transformed into

$$u = \phi(s)^T\Phi^T\left(\Phi\Phi^T + W_U\right)^{-1}U \tag{7}$$

with $W_U = \mathrm{diag}(\tilde{u}_1^T N \tilde{u}_1, \ldots, \tilde{u}_T^T N \tilde{u}_T)$. By introducing
the kernels $k(s) = \phi(s)^T\Phi^T$ and $K = \Phi\Phi^T$ we obtain

$$u = k(s)\left(K + W_U\right)^{-1}U,$$

which is related to kernel regression [15]. This kernelized
form of Eq. (7) overcomes the limitations of the linear model
at a cost of higher computational complexity.
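The cost-weighted regression of Eq. (6), together with the exponential transformation of the cost into weights, can be sketched as follows. The toy data, the value of $\alpha$, and the function names `cost_weights` and `fit_weighted_policy` are illustrative assumptions.

```python
import numpy as np


def cost_weights(U, U0, N, alpha):
    """w_t = exp(-alpha * (u_t - u_0)^T N (u_t - u_0)) for every sample t."""
    U_tilde = U - U0
    cost = np.einsum("ti,ij,tj->t", U_tilde, N, U_tilde)
    return np.exp(-alpha * cost)


def fit_weighted_policy(Phi, U, w, ridge=1e-6):
    """Eq. (6): theta = (Phi^T W Phi + lambda I)^{-1} Phi^T W U, W = diag(w)."""
    Phi_w = Phi * w[:, None]                    # rows of Phi scaled by w_t
    A = Phi.T @ Phi_w + ridge * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi_w.T @ U)


# Two inconsistent demonstrations for the same feature value: one command
# close to the null-space control u_0 = 0, one far away from it.
Phi = np.ones((2, 1))
U = np.array([[0.1], [2.0]])
U0 = np.zeros((1, 1))
N = np.eye(1)

w = cost_weights(U, U0, N, alpha=5.0)
theta = fit_weighted_policy(Phi, U, w)
print(w, theta)
```

An unweighted regression would average the two commands to 1.05. The weighting instead suppresses the sample with the high cost, so the learned command stays close to 0.1, the demonstration nearest to the null-space control.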

B. Prioritized Primitives Control Law 

In the previous section, we have described how the control
law for a single primitive can be learned. To generalize this
approach to multiple primitives with different priorities, we
want a control law that always fulfills the primitive with
the highest priority and follows the remaining primitives as
much as possible according to their place in the hierarchy.
Our idea is to represent the higher priority control laws as
correction terms with respect to the lower priority primitives.
The control of the primitive with the lowest priority is
learned first. This control is subsequently considered to be a
baseline and the primitives of higher priority only learn the
difference to this baseline control. The change between the
motor commands resulting from primitives of lower priority
is minimized. The approach is reminiscent of online
passive-aggressive algorithms [16]. Hence, control laws of higher
priority primitives only learn the offset between their desired
behavior and the behavior of the lower priority primitives.
This structure allows them to override the actions of the
primitives of lesser priority and, therefore, add more detailed
control in the regions of the state space they are concerned
with. The combined control of all primitives is

$$u = u_0 + \sum_{n=1}^{N}\Delta u_n,$$

where $u_0$ is the null-space control and $\Delta u_n$ are the offset
controls of the $N$ primitives.

Such control laws can be expressed by changing the
program in Eq. (5) to

$$\min_{u_i} J = \left(u_i - \sum_{j=1}^{i-1}\Delta u_j - u_0\right)^T N \left(u_i - \sum_{j=1}^{i-1}\Delta u_j - u_0\right)$$
$$\text{s.t.}\quad \ddot{x}_i = \pi_i(x_i, \dot{x}_i, z),$$

where the primitives need to be learned in the increasing
order of their priority: the primitive with the lowest priority is
learned first and the primitive with the highest priority is learned
last. The regression in Eq. (6) changes to

$$\theta_i = \left(\Phi_i^T W_i \Phi_i + \lambda I\right)^{-1}\Phi_i^T W_i \tilde{U}_i,$$

where $\tilde{U}_i$ contains the offset controls
$\tilde{u}_{i,t} = u_{i,t} - \sum_{j=1}^{i-1}\Delta u_{j,t} - u_{0,t}$ for all time-steps $t$,
where $\Delta u_{j,t} = \phi_{j,t}^T\theta_j$. The weighting matrix $W_i$ now has the
weights $w_t = \exp(-\alpha\,\tilde{u}_{i,t}^T N \tilde{u}_{i,t})$ on its diagonal and
matrix $\tilde{U}_i$ contains offset controls $\tilde{u}_{i,t}$. The kernelized form
of the prioritized control law can be obtained analogously. The
complete approach is summarized in Algorithm 1.
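The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation used for the experiments: the data layout (one feature matrix, control matrix, and null-space control matrix per primitive, all with the same number of time-steps), the function names, and the synthetic data are our own choices. The offsets $\Delta u_{j,t} = \phi_{j,t}^T\theta_j$ follow the indexing of the algorithm literally.

```python
import numpy as np


def learn_prioritized_control(Phi, U, U0, N, alpha=1.0, ridge=1e-6):
    """Learn one parameter matrix per primitive, lowest priority first.

    Phi[i]: (T, d) features, U[i]: (T, m) demonstrated controls and
    U0[i]: (T, m) null-space controls of primitive i (index 0 = lowest).
    """
    thetas = []
    for i in range(len(Phi)):
        # Offset controls: remove null-space control and lower-priority offsets.
        U_tilde = U[i] - U0[i]
        for j in range(i):
            U_tilde = U_tilde - Phi[j] @ thetas[j]
        # Weights w_{i,t} = exp(-alpha * u~^T N u~).
        cost = np.einsum("ti,ij,tj->t", U_tilde, N, U_tilde)
        w = np.exp(-alpha * cost)
        # Weighted ridge regression for the parameters of primitive i.
        Phi_w = Phi[i] * w[:, None]
        A = Phi[i].T @ Phi_w + ridge * np.eye(Phi[i].shape[1])
        thetas.append(np.linalg.solve(A, Phi_w.T @ U_tilde))
    return thetas


def combined_control(phis, thetas, u0):
    """u = u_0 + sum_n Delta u_n with Delta u_n = phi_n^T theta_n."""
    return u0 + sum(phi @ theta for phi, theta in zip(phis, thetas))


# Synthetic example with two primitives, d = 3 features, m = 2 controls.
rng = np.random.default_rng(1)
T, d, m = 100, 3, 2
Phi = [rng.normal(size=(T, d)) for _ in range(2)]
A_low, A_high = rng.normal(size=(d, m)), rng.normal(size=(d, m))
U0 = [np.zeros((T, m)), np.zeros((T, m))]
U = [Phi[0] @ A_low, Phi[0] @ A_low + Phi[1] @ A_high]

thetas = learn_prioritized_control(Phi, U, U0, N=np.eye(m), alpha=0.01)
u = combined_control([Phi[0][0], Phi[1][0]], thetas, u0=np.zeros(m))
print(u)
```

In this synthetic setting the higher priority primitive only has to explain the part of its demonstrated control that the lower priority primitive does not already produce, so its learned parameters match the offset `A_high`.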

IV. EVALUATION: BALL-BOUNCING 

In order to evaluate the proposed prioritized control
approach, we chose a ball-bouncing task. We describe the task
in Section IV-A, explain a possible higher level strategy in
Section IV-B, and discuss how the proposed framework can
be applied in Section IV-C.

A. Task Description 

The goal of the task is to bounce a table tennis ball above a 
racket. The racket is held in the player's hand, or in our case 
attached to the end-effector of the robot. The ball is supposed 
to be kept bouncing on the racket. A possible movement is
illustrated in Fig. 2.




(a) Exaggerated schematic drawing. The green arrows indicate velocities.

(b) Paddling movement for the simulated robot. The black ball represents the imagined target (see Sect. IV-B).

(c) Paddling movement for the real Barrett WAM.

Fig. 2: This figure illustrates a possible sequence of bouncing the ball on the racket in a schematic drawing, in simulation,
and on the real robot.



It is desirable to stabilize the bouncing movement to a
strictly vertical bounce, hence avoiding the need for the
player to move a lot in space and, thus, to leave the work
space of the robot. The hitting height is a trade-off: a higher
bounce leaves more time until the next hit, at the expense of
the next hitting position possibly being further away. The task
can be subdivided into three intuitive primitives: hitting the
ball upward, moving the racket under the ball before hitting,
and changing the orientation of the racket to move the ball
to a desired location. A possible strategy is outlined in the
next section.

The ball is tracked using a stereo vision setup and its 
positions and velocities are estimated by a Kalman filter. To 
initialize the ball-bouncing task, the ball is thrown towards 
the racket. 

B. Bouncing Strategy 

The strategy employed to achieve the desired bouncing
behavior is based on an imagined target that indicates the
desired bouncing height. This target is above the default
posture of the racket. The top point of the ball trajectory
is supposed to hit this target, and the stable behavior should
be a strictly vertical bounce. This behavior can be achieved
by defining a hitting plane, i.e., a height at which the ball
is always hit (which corresponds to the default posture of
the racket). On this hitting plane, the ball is always hit in
a manner that the top point of its trajectory corresponds to
the height of the target and the next intersection of the ball
trajectory with the hitting plane is directly under the target.
See Fig. 3 for an illustration.

To achieve this desired ball behavior, the racket is always 
moved to the intersection point of the ball trajectory and 
the hitting plane. By choosing the hitting velocity and the 
orientation of the racket, the velocity and direction of the ball 
after being hit can be changed. The required hitting velocity 
and orientation are calculated using a model of the ball and 
the racket. The ball is modeled as a point mass that moves 
according to the ballistic flight equations. For the relatively 
low speeds and small distances air resistance is negligible. 
The contact with the racket is modeled as a reflection with 
a restitution factor. 
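The ball model behind this strategy can be sketched as follows. The sketch assumes a point mass without air resistance, a horizontal hitting plane, and a reflection model in which the velocity component along the racket normal, relative to the racket, is reversed and scaled by the restitution factor. The function names and all numerical values are illustrative and not taken from the paper.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2


def hitting_plane_intersection(pos, vel, z_plane):
    """Time and horizontal position at which the descending ball reaches z_plane."""
    pos, vel = np.asarray(pos, float), np.asarray(vel, float)
    disc = vel[2] ** 2 + 2.0 * G * (pos[2] - z_plane)
    if disc < 0.0:
        raise ValueError("the ball does not reach the hitting plane")
    t = (vel[2] + np.sqrt(disc)) / G            # later root: ball is descending
    return t, pos[:2] + vel[:2] * t


def desired_outgoing_velocity(hit_xy, target_xy, apex_height):
    """Ball velocity after the hit: apex at apex_height above the hitting
    plane and next intersection with the plane directly under the target."""
    vz = np.sqrt(2.0 * G * apex_height)
    t_flight = 2.0 * vz / G
    vxy = (np.asarray(target_xy, float) - np.asarray(hit_xy, float)) / t_flight
    return np.array([vxy[0], vxy[1], vz])


def reflect(v_ball, v_racket, normal, restitution):
    """Assumed contact model: reflection about the racket normal."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)
    v_ball = np.asarray(v_ball, float)
    v_rel = v_ball - np.asarray(v_racket, float)
    return v_ball - (1.0 + restitution) * np.dot(v_rel, n) * n


# Ball dropped 0.2 m beside the target from 1 m above the hitting plane.
t_hit, hit_xy = hitting_plane_intersection([0.2, 0.0, 1.0], [0.0, 0.0, 0.0], 0.0)
v_out = desired_outgoing_velocity(hit_xy, [0.0, 0.0], apex_height=0.5)
t_next, next_xy = hitting_plane_intersection([hit_xy[0], hit_xy[1], 0.0], v_out, 0.0)
v_bounce = reflect([0.0, 0.0, -2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0.8)
print(v_out, next_xy, v_bounce)
```

With the computed outgoing velocity, the next intersection with the hitting plane lies directly under the target. Inverting `reflect` for the racket velocity and normal that produce `v_out` would give the required hitting velocity and orientation.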

Using this strategy, the ball can be brought back to a strictly
vertical bouncing behavior with a single hit. However, this
method requires the knowledge of the ball position and
velocity, as well as a model of the ball behavior. An alternative
strategy that stabilizes the behavior in a completely open-loop
manner employs a slightly concave paddle shape [17]. A
method similar to the proposed strategy has been employed
by [18], [19], and [20] proposed the mirror law for this task.
The ball-bouncing task has also been employed to study how
humans stabilize a rhythmic task [21].

Fig. 3: This figure illustrates the employed strategy for
bouncing the ball on the racket. The highest point of the
ball trajectory is supposed to coincide with the red target.
The racket always hits the ball at a fixed height, i.e.,
the hitting plane. The strategy is to play the ball in a way that
the next intersection with the hitting plane is directly below
the target and the maximum height of the ball trajectory
corresponds to the height of the target. If the bounce works
exactly as planned, the ball needs to be hit only once to
return to a strictly vertical bouncing behavior.

C. Learning Results 

As discussed in Section IV-A, the task can be described
by three primitives: "move under the ball", "hit the ball",
and "change racket orientation". Training data is collected
in the relevant state space independently for each primitive.
To do so, the parameters corresponding to the other
primitives are kept fixed and variants of the primitive are
executed from various different starting positions. The
primitive "move under the ball" corresponds to movements
in the horizontal plane, the primitive "hit the ball" to up
and down movements, and the primitive "change racket
orientation" only changes the orientation of the end-effector.
We collected 30 seconds of training data for each primitive,
corresponding to approximately 60 bounces.

Having only three primitives allows us to enumerate all
six possible dominance structures, to learn the corresponding
prioritized control law, and to evaluate the controller. As an
intuitive quality measure, we counted the number of bounces
until the robot missed, either due to imprecise control or due
to the ball being outside of the safely reachable work-space.

Table I illustrates the resulting dominance structures. The
most relevant primitive is the "hit the ball" primitive,
followed by the "move under the ball" primitive. In the table it
is clearly visible that inverting the order of two neighboring
primitives that are in the preferred dominance order always
results in a lower number of hits. Compared to a single
model that was trained using the combined training data
of the three primitives, all but two prioritized control laws
work significantly better. The ordering may appear slightly
counter-intuitive as moving under the ball seems to be the
most important primitive in order to keep the ball in the
air, allowing for later corrections. However, the robot has a
fixed base position and the ball moves quickly out of the
safely reachable work-space, resulting in a low number of
hits. Additionally, the default position of the racket is almost
vertical, hence covering a fairly large area of the horizontal
plane resulting in robustness with respect to errors in this
primitive.

Dominance Structure              Number of Hits
single model                      5.70 ± 0.73     1.10 ± 0.99
hit $\succ$ move $\succ$ orient  11.35 ± 2.16     2.30 ± 0.67
hit $\succ$ orient $\succ$ move  10.85 ± 1.46     1.70 ± 0.95
move $\succ$ hit $\succ$ orient   9.05 ± 0.76     1.40 ± 0.70
move $\succ$ orient $\succ$ hit   7.75 ± 1.48     1.40 ± 0.84
orient $\succ$ hit $\succ$ move   5.90 ± 0.85     1.30 ± 0.67
orient $\succ$ move $\succ$ hit   5.35 ± 0.49     1.30 ± 0.48

TABLE I: This table shows the suitability of the possible
dominance structures (mean ± std). The "hit the ball"
primitive clearly is the dominant one, followed by the "move
under the ball" primitive. The prioritized control laws work
significantly better than a single model learned using the
combined training data of the three primitives. Preliminary
results on the real robot confirm this ordering.

V. CONCLUSION 

In this paper, we have presented a prioritized control
learning approach that is based on the superposition of
movement primitives. We have introduced a novel framework for
learning prioritized control. The controls of the lower priority
primitives are fulfilled as long as they lie in the null space
of the higher priority ones and get overridden otherwise. As
representation for the primitives, we employ the dynamical
systems motor primitives [10], [11], which yield controls in the
form of desired accelerations. These primitives are executed
separately to collect training data. Local linear models are
trained using a weighted regression technique incorporating
the various possible dominance structures. In the presented
ball-bouncing task, the movement is restricted to a space
where the controls are approximately linear. Hence, a single
linear model per primitive was sufficient. This limitation
can be overcome by either considering multiple local linear
models or by kernelizing the weighted regression, as
described in Sect. III-A and III-B.

The dominance structure of the task was determined by
testing all possible structures exhaustively. Intuitively, the
lower priority primitives represent a global behavior and
the high priority primitives represent specialized corrections,
hence overriding the lower priority controls. In most cases,
the resulting prioritized control works significantly better
than a single layer one that was trained with the
combined training data of all primitives. As illustrated by the
evaluations, the dominance structure can have a significant
influence on the global success of the prioritized control.
The number of possible dominance structures is factorial in
the number of primitives, so enumerating them all is
infeasible in practice for more than four primitives. In this
case, smarter search strategies are needed.

The success of the different dominance structures not only 
depends on the task but also on the employed strategy of 
activating and adapting the different primitives. An interest- 
ing area for future research could be to jointly learn the 
prioritized control and the strategy. 

The presented approach has been evaluated both in sim- 
ulation and on a real Barrett WAM and we have demon- 
strated that our novel approach can successfully learn a ball- 
bouncing task. 